-
Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Eds.). Spatial evolutionary games are used to model large systems of interacting agents. In earlier work, a method was developed that uses Bayesian networks to approximate the population dynamics in these games. One advantage of that approach is that the size of the network can be adjusted smoothly to obtain more accurate approximations. However, scaling the method up can become intractable as the number of strategies in the evolutionary game increases. In this paper, we propose a new method for computing more accurate approximations by using surrogate Bayesian networks. Instead of doing inference on larger networks directly, we do it on a much smaller surrogate network extended with parameters that exploit the symmetry inherent to the domain. We learn the parameters on the surrogate network using KL divergence as the loss function. We illustrate the value of this method empirically through a comparison on several evolutionary games.
Free, publicly-accessible full text available May 2, 2026.
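As a concrete illustration of the KL-based parameter-learning step, the sketch below fits the parameters of a toy surrogate distribution over three strategies by gradient descent on a KL-divergence loss. The target frequencies, the softmax parameterization, and the learning rate are assumptions made up for this example; the paper's surrogate is a full Bayesian network, not a single distribution.

```python
# Minimal sketch (hypothetical, not the paper's implementation): learn the
# parameters of a surrogate distribution over strategies by gradient descent
# on a KL-divergence loss.
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return np.sum(p * (np.log(p) - np.log(q)))

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Target strategy frequencies (made up for the example).
p_target = np.array([0.5, 0.3, 0.2])

theta = np.zeros(3)  # surrogate parameters
lr = 0.5
for _ in range(500):
    q = softmax(theta)
    theta -= lr * (q - p_target)  # gradient of KL(p || softmax(theta))

print(kl(p_target, softmax(theta)))  # ~0 after convergence
```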
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.).
Free, publicly-accessible full text available May 15, 2026.
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.). Many real-world situations allow for the acquisition of additional relevant information when making decisions with limited or uncertain data. However, traditional reinforcement learning (RL) approaches either require all features to be acquired beforehand (e.g., in an MDP) or treat some of them as missing data that cannot be acquired (e.g., in a POMDP). In this work, we consider RL models that may actively acquire features from the environment to improve decision quality and certainty, while automatically balancing the cost of the feature-acquisition process against the reward of the task-decision process. We propose the Active-Acquisition POMDP (AA-POMDP) and identify two types of acquisition process for different application domains. To assist the agent in the actively-acquired, partially-observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach in which a deep generative model captures the dependencies among the features and imputes the unobserved ones. The imputations essentially represent the beliefs of the agent. Equipped with the dynamics model, we develop hierarchical RL algorithms to solve both types of AA-POMDP. Empirical results demonstrate that our approach achieves considerably better performance than existing POMDP-RL solutions.
Free, publicly-accessible full text available May 5, 2026.
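A toy sketch of the acquire-versus-act trade-off described above. The imputation function, the uncertainty proxy, and the threshold are hypothetical stand-ins: the paper uses a learned deep generative model for imputation and hierarchical RL for the policy, neither of which is reproduced here.

```python
# Toy sketch (all names and numbers hypothetical): one step of the
# acquire-versus-act loop in an actively-acquired partially-observed setting.
import numpy as np

def impute(observed, mask, prior_mean=0.0):
    """Fill unobserved features with a prior mean -- a crude stand-in for
    the posterior belief of a deep generative model."""
    return np.where(mask, observed, prior_mean)

def decide(uncertainty, threshold=0.5):
    """Acquire another feature while belief uncertainty is high."""
    return "acquire" if uncertainty > threshold else "act"

rng = np.random.default_rng(0)
true_x = rng.normal(size=4)
mask = np.array([True, False, False, False])  # only one feature observed
belief = impute(true_x, mask)
uncertainty = 1.0 - mask.mean()  # crude proxy for belief entropy
print(belief, decide(uncertainty))  # -> "acquire": too few features observed
```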
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.).
Free, publicly-accessible full text available May 3, 2026.
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.).
Free, publicly-accessible full text available May 3, 2026.
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.). Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using only experiences from another behavior policy that is potentially unknown. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck in applying DICE estimators is the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. This primal-dual representation not only bypasses the non-convex, non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient use of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage a linear representation of the primal-dual variables that is both computation- and sample-efficient, and its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
Free, publicly-accessible full text available May 3, 2026.
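The linear-algebra core of the spectral idea can be illustrated on a tabular toy problem: take a low-rank SVD of a transition matrix and use the leading singular pairs as state features. This is only a sketch on made-up data, not the paper's SpectralDICE algorithm, which works with learned representations and the full primal-dual DICE objective.

```python
# Illustrative sketch (not SpectralDICE itself): spectral decomposition of a
# tabular transition matrix yields linear state features.
import numpy as np

rng = np.random.default_rng(0)
n_states, rank = 20, 5

# Random row-stochastic transition matrix P(s' | s).
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)

U, S, Vt = np.linalg.svd(P)
phi = U[:, :rank] * S[:rank]   # state features from the top singular pairs
P_hat = phi @ Vt[:rank]        # rank-`rank` approximation of the operator

# The approximation error shrinks as `rank` grows toward n_states.
print(np.linalg.norm(P - P_hat))
```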
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.). We study the problem of causal effect estimation in the presence of unobserved confounders, focusing on two settings: instrumental variable (IV) regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator combined with a saddle-point optimization method. In the IV regression setting, this can be viewed as a neural network generalization of the seminal approach due to Darolles et al. (2011). Saddle-point formulations have recently gained attention because they mitigate the double-sampling bias and are compatible with modern function approximation methods. We provide experimental validation across various settings and show that our approach outperforms existing methods on common benchmarks.
Free, publicly-accessible full text available May 3, 2026.
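As a rough illustration of the operator-SVD idea, the sketch below estimates a conditional expectation operator on a discretized toy IV problem and inspects its singular values. The data-generating process and the binning scheme are assumptions for the example only; the paper works with neural function classes and saddle-point training rather than histograms.

```python
# Hedged sketch: estimate a conditional expectation operator on a discretized
# toy IV problem and take its SVD.
import numpy as np

rng = np.random.default_rng(0)
n, bins = 5000, 10

z = rng.uniform(0, 1, n)           # instrument
x = z + 0.1 * rng.normal(size=n)   # endogenous variable driven by z
zi = np.clip((z * bins).astype(int), 0, bins - 1)
xi = np.clip((x * bins).astype(int), 0, bins - 1)

# Empirical P(x-bin | z-bin): rows index z-bins, columns index x-bins.
counts = np.zeros((bins, bins))
np.add.at(counts, (zi, xi), 1.0)
T = counts / counts.sum(axis=1, keepdims=True)

U, S, Vt = np.linalg.svd(T)
print(S[:3])  # leading singular values of the estimated operator
```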
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.).
Free, publicly-accessible full text available May 3, 2026.
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.). Network Markov Decision Processes (MDPs), the de facto model for multi-agent control, pose a significant challenge to efficient learning because the global state-action space grows exponentially with the number of agents. In this work, exploiting the exponential decay property of network dynamics, we first derive scalable spectral local representations for multi-agent reinforcement learning in network MDPs, which induce a linear subspace for the local Q-function of each agent. Building on these local spectral representations, we design a scalable algorithmic framework for multi-agent reinforcement learning in continuous state-action network MDPs, and provide end-to-end guarantees for the convergence of our algorithm. Empirically, we validate the effectiveness of our scalable representation-based approach on two benchmark problems, and demonstrate its advantages over generic function approximation approaches to representing the local Q-functions.
Free, publicly-accessible full text available May 3, 2026.
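A minimal sketch of the representational claim: once a local Q-function lies in a linear subspace spanned by spectral features, its weights can be recovered by ordinary least squares. Everything below is synthetic and illustrative (the features are random stand-ins), not the paper's algorithm.

```python
# Synthetic sketch: a local Q-function assumed linear in (stand-in) spectral
# features; its weights are recovered by least squares.
import numpy as np

rng = np.random.default_rng(0)
n_samples, d = 200, 8

phi = rng.normal(size=(n_samples, d))   # stand-in spectral features
w_true = rng.normal(size=d)             # unknown Q-function weights
q_targets = phi @ w_true + 0.01 * rng.normal(size=n_samples)

# Ordinary least squares in the feature subspace.
w_hat, *_ = np.linalg.lstsq(phi, q_targets, rcond=None)
print(np.linalg.norm(w_hat - w_true))   # near zero: weights recovered
```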
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Eds.). Optimization problems with norm-bounding constraints appear in various applications, from portfolio optimization to machine learning, feature selection, and beyond. A widely used variant of these problems relaxes the norm-bounding constraint through Lagrangian relaxation and moves it to the objective function as a form of penalty or regularization term. A challenging class of these models uses the zero-norm function to induce sparsity in statistical parameter estimation models. Most existing exact solution methods for these problems use additional binary variables together with artificial bounds on variables to formulate them as a mixed-integer program in a higher dimension, which is then solved by off-the-shelf solvers. Other exact methods utilize specific structural properties of the objective function to solve certain variants of these problems, making them non-generalizable to other problems with different structures. An alternative approach employs nonconvex penalties with desirable statistical properties, which are solved using heuristic or local methods due to the structural complexity of those terms. In this paper, we develop a novel graph-based method to globally solve optimization problems that contain a generalization of norm-bounding constraints. This includes standard ℓp-norms for p ∈ [0, ∞) as well as nonconvex penalty terms, such as SCAD and MCP, as special cases. Our method uses decision diagrams to build strong convex relaxations for these constraints in the original space of variables without the need to introduce additional auxiliary variables or impose artificial variable bounds. We show that the resulting convexification method, when incorporated into a spatial branch-and-cut framework, converges to the global optimal value of the problem. To demonstrate the capabilities of the proposed framework, we conduct preliminary computational experiments on benchmark sparse linear regression problems with challenging nonconvex penalty terms that cannot be modeled or solved by existing global solvers.
Free, publicly-accessible full text available May 1, 2026.
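For reference, the SCAD and MCP penalties mentioned above have standard closed forms from the statistics literature; the sketch below implements them elementwise. The parameter defaults (a = 3.7 for SCAD, γ = 3 for MCP) are conventional choices, not values taken from the paper.

```python
# Reference implementations of the SCAD and MCP penalties (standard
# definitions from the statistics literature, not code from the paper).
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001), applied elementwise."""
    t = np.abs(t)
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty, applied elementwise."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)

print(scad(np.array([0.1, 1.0, 5.0]), lam=0.5))
print(mcp(np.array([0.1, 1.0, 5.0]), lam=0.5))
```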
